# Knowledge-Distilled BERT
## Bert L12 H240 A12
A variant of the BERT model pre-trained with knowledge distillation, with a hidden size of 240 and 12 attention heads, suitable for masked language modeling tasks.

Tags: Large Language Model, Transformers

Author: eli4s
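Since the entry is tagged for the Transformers library, a masked-LM checkpoint of this shape can typically be loaded as sketched below. The Hub id `eli4s/Bert-L12-h240-A12` is an assumption reconstructed from this listing (author eli4s, model Bert L12 H240 A12); check the actual repository name before running.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Hypothetical Hub id reconstructed from the listing; verify before use.
model_id = "eli4s/Bert-L12-h240-A12"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)
model.eval()

# Build an input containing the tokenizer's mask token.
text = f"The capital of France is {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the masked position and take the highest-scoring vocabulary entry.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_pos].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```

Working with the raw logits, as here, is useful when you want top-k candidates or custom decoding rather than a single filled-in token.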
## Bert L12 H256 A4
A lightweight BERT model pre-trained with knowledge distillation, with a hidden size of 256 and 4 attention heads, suitable for masked language modeling tasks.

Tags: Large Language Model, Transformers

Author: eli4s
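For quick experimentation, the same masked-LM task can be run through the fill-mask pipeline instead of handling logits directly. Again a sketch: the Hub id `eli4s/Bert-L12-h256-A4` is an assumption reconstructed from this entry and should be verified on the Hub.

```python
from transformers import pipeline

# Hypothetical Hub id reconstructed from the listing; verify before use.
fill_mask = pipeline("fill-mask", model="eli4s/Bert-L12-h256-A4")

# The pipeline returns the top candidates for the [MASK] position with scores.
for prediction in fill_mask("Knowledge distillation makes BERT models [MASK] to deploy."):
    print(f"{prediction['token_str']:>12}  score={prediction['score']:.3f}")
```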